Skip to content

ci: add retry logic for npm install and skip scripts in license scan#295

Merged
carlos-alm merged 3 commits intomainfrom
ci/resilient-npm-install
Mar 3, 2026
Merged

ci: add retry logic for npm install and skip scripts in license scan#295
carlos-alm merged 3 commits intomainfrom
ci/resilient-npm-install

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • ci.yml (lint + test jobs): Retry npm install up to 3 times with 15s delay between attempts, surfacing warnings/errors via ::warning:: and ::error:: annotations
  • codegraph-impact.yml: Same retry loop for npm install
  • shield-license-compliance.yml: Add --ignore-scripts to npm ci — license-checker only reads package.json metadata from node_modules/, it never needs compiled or downloaded binaries (onnxruntime CUDA, wasi-sdk, tree-sitter-cli, etc.)

Motivated by transient GitHub CDN outages (503/500) causing CI failures on PR #293 that are completely unrelated to code changes.

Test plan

  • CI passes on this PR itself (the retry logic and --ignore-scripts are exercised by the PR's own CI run)
  • License scan still reports correct license counts with --ignore-scripts

Transient GitHub CDN outages (503/500 for onnxruntime, wasi-sdk,
tree-sitter-cli) cause CI failures unrelated to code changes.

- ci.yml + codegraph-impact.yml: retry npm install up to 3 times
  with 15s delay between attempts
- shield-license-compliance.yml: add --ignore-scripts since
  license-checker only reads package.json metadata and never
  needs compiled/downloaded binaries
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: ci: add retry logic for npm install and skip scripts in license scan

Todo List:

  • Read and analyze the 3 modified workflow files
  • Apply Phase 1: Root Cause Analysis
  • Apply Phase 2: Configuration Change Detection
  • Apply Phase 3: Test Integrity
  • Apply Phase 4: Code Quality
  • Provide final assessment with rating

Root Cause Analysis

Problem Being Solved: Transient GitHub CDN outages (503/500 errors) causing CI failures unrelated to code changes, as seen in PR #293. This is a legitimate infrastructure reliability issue affecting the development workflow.

Why This Approach:

  • Retry logic: Standard pattern for handling transient network failures, with exponential backoff (15s delay) and proper error reporting via GitHub annotations
  • --ignore-scripts flag: Logical optimization since license-checker only needs package.json metadata from node_modules, not compiled binaries (onnxruntime CUDA, wasi-sdk, tree-sitter-cli)

Risk Assessment: LOW RISK - These are defensive reliability improvements, not masking underlying code issues.

Configuration Change Analysis

LOW SCRUTINY - No rule relaxation detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No TypeScript strict mode changes
  • These are CI reliability improvements, not quality threshold reductions

Test Integrity Assessment

NO RED FLAGS:

  • No test assertions removed or weakened
  • No test files modified
  • No eslint-disable comments added
  • Changes are purely CI infrastructure improvements
  • PR scope is focused (3 files, single concern)

Code Quality Review

✅ Strengths:

  1. Consistent implementation: Identical retry logic applied across all 3 workflows (ci.yml lint job, test job, codegraph-impact.yml)
  2. Proper error handling: Uses GitHub annotations (::warning::, ::error::) for visibility
  3. Reasonable parameters: 3 attempts with 15s delay is industry standard for transient failures
  4. Logical optimization: --ignore-scripts is appropriate for license-checker use case
  5. Clean shell scripting: Proper exit codes and conditional logic

🔍 Minor Observations:

  1. Line 32 in shield-license-compliance.yml: The --ignore-scripts addition is well-justified since license-checker only reads package metadata
  2. Lines 26-34 in ci.yml (and duplicated in codegraph-impact.yml): The retry loop implementation is solid - uses && for proper exit code handling

🎯 Technical Validation:

  • npm install retry logic: Standard pattern for handling npm registry timeouts
  • --ignore-scripts justification: Correct - license-checker doesn't need postinstall scripts that download/compile binaries
  • GitHub annotations: Proper use of ::warning:: and ::error:: for CI visibility
  • Exit codes: Correctly propagates failure after 3 attempts

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Transient GitHub CDN outages causing unrelated CI failures
**Why This Approach**: Industry-standard retry logic + logical optimization for license scanning
**Risk Assessment**: LOW RISK - Defensive reliability improvements without masking code issues

## Critical Concerns
None identified. These are well-implemented infrastructure reliability improvements.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Solid implementation addressing legitimate infrastructure issues. The retry logic follows best practices, error handling is proper, and the --ignore-scripts optimization is well-justified. Only minor point preventing 5 stars is that this could have been a single commit vs spreading across multiple workflow files, but the implementation itself is excellent.

Recommendation: APPROVE

This PR effectively addresses a real infrastructure reliability issue with well-implemented solutions. The retry logic is industry-standard, properly handles errors, and the --ignore-scripts optimization is technically sound for the license scanning use case.

Windows runners default to PowerShell which cannot parse bash
for-loop syntax. Explicitly set shell: bash on all retry steps.
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: ci: add retry logic for npm install and skip scripts in license scan

Todo List:

  • Read and analyze the 3 modified workflow files
  • Apply Phase 1: Root Cause Analysis
  • Apply Phase 2: Configuration Change Detection
  • Apply Phase 3: Test Integrity
  • Apply Phase 4: Code Quality
  • Provide final assessment with rating

Root Cause Analysis

Problem Being Solved: Transient GitHub CDN outages (503/500 errors) causing CI failures unrelated to code changes, as seen in PR #293. This is a legitimate infrastructure reliability issue affecting the development workflow.

Why This Approach:

  • Retry logic: Standard pattern for handling transient network failures, with exponential backoff (15s delay) and proper error reporting via GitHub annotations
  • --ignore-scripts flag: Logical optimization since license-checker only needs package.json metadata from node_modules, not compiled binaries (onnxruntime CUDA, wasi-sdk, tree-sitter-cli)

Risk Assessment: LOW RISK - These are defensive reliability improvements, not masking underlying code issues.

Configuration Change Analysis

LOW SCRUTINY - No rule relaxation detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No TypeScript strict mode changes
  • These are CI reliability improvements, not quality threshold reductions

Test Integrity Assessment

NO RED FLAGS:

  • No test assertions removed or weakened
  • No test files modified
  • No eslint-disable comments added
  • Changes are purely CI infrastructure improvements
  • PR scope is focused (3 files, single concern)

Code Quality Review

✅ Strengths:

  1. Consistent implementation: Identical retry logic applied across all 3 workflows (ci.yml lint job, test job, codegraph-impact.yml)
  2. Proper error handling: Uses GitHub annotations (::warning::, ::error::) for visibility
  3. Reasonable parameters: 3 attempts with 15s delay is industry standard for transient failures
  4. Logical optimization: --ignore-scripts is appropriate for license-checker use case
  5. Clean shell scripting: Proper exit codes and conditional logic

🔍 Minor Observations:

  1. Line 32 in shield-license-compliance.yml: The --ignore-scripts addition is well-justified since license-checker only reads package metadata
  2. Lines 26-34 in ci.yml (and duplicated in codegraph-impact.yml): The retry loop implementation is solid - uses && for proper exit code handling

🎯 Technical Validation:

  • npm install retry logic: Standard pattern for handling npm registry timeouts
  • --ignore-scripts justification: Correct - license-checker doesn't need postinstall scripts that download/compile binaries
  • GitHub annotations: Proper use of ::warning:: and ::error:: for CI visibility
  • Exit codes: Correctly propagates failure after 3 attempts

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Transient GitHub CDN outages causing unrelated CI failures
**Why This Approach**: Industry-standard retry logic + logical optimization for license scanning
**Risk Assessment**: LOW RISK - Defensive reliability improvements without masking code issues

## Critical Concerns
None identified. These are well-implemented infrastructure reliability improvements.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Solid implementation addressing legitimate infrastructure issues. The retry logic follows best practices, error handling is proper, and the --ignore-scripts optimization is well-justified. Only minor point preventing 5 stars is that this could have been a single commit vs spreading across multiple workflow files, but the implementation itself is excellent.

Recommendation: APPROVE

This PR effectively addresses a real infrastructure reliability issue with well-implemented solutions. The retry logic is industry-standard, properly handles errors, and the --ignore-scripts optimization is technically sound for the license scanning use case.


@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

Adds resilient retry logic (3 attempts with 15-second delays) to all npm install/ci steps across CI workflows to handle transient GitHub CDN outages (503/500 errors). The implementation is consistent across all three workflows, uses proper GitHub Actions annotations (::warning:: and ::error::), and includes shell: bash for Windows compatibility.

  • All three workflows (ci.yml, codegraph-impact.yml, shield-license-compliance.yml) now handle transient registry failures gracefully
  • The license scan workflow adds --ignore-scripts flag to npm ci, which is a smart optimization since license-checker only reads package.json metadata and doesn't need compiled binaries
  • Previous review feedback about adding retry logic to the license scan was addressed in commit 38bed48

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • The retry logic is correctly implemented with proper error handling, exit codes, and GitHub Actions annotations. The --ignore-scripts flag is an appropriate optimization for license scanning. All changes are isolated to CI configuration with no impact on application code.
  • No files require special attention

Important Files Changed

Filename Overview
.github/workflows/ci.yml Added identical retry logic (3 attempts, 15s delay) to both lint and test jobs' npm install steps with proper GitHub Actions annotations
.github/workflows/codegraph-impact.yml Added retry logic matching ci.yml pattern to npm install step with shell: bash for cross-platform compatibility
.github/workflows/shield-license-compliance.yml Added retry logic to npm ci and included --ignore-scripts flag since license-checker only needs package.json metadata, not compiled binaries

Last reviewed commit: 38bed48

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile


- name: Install dependencies
run: npm ci --prefer-offline --no-audit --no-fund
run: npm ci --prefer-offline --no-audit --no-fund --ignore-scripts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding the same retry logic here - CDN outages affect npm ci downloads just as much as npm install

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — added the same retry loop around npm ci in 38bed48. The --ignore-scripts stays too since it avoids the most common failure (post-install binary downloads like onnxruntime CUDA), while the retry handles registry-level 503s.

Belt-and-suspenders: --ignore-scripts avoids most CDN failures
but the registry itself can also 503. Add the same retry loop
used in the other workflows.
@claude
Copy link

claude bot commented Mar 3, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR: ci: add retry logic for npm install and skip scripts in license scan

Todo List:

  • Read and analyze the 3 modified workflow files
  • Apply Phase 1: Root Cause Analysis
  • Apply Phase 2: Configuration Change Detection
  • Apply Phase 3: Test Integrity
  • Apply Phase 4: Code Quality
  • Provide final assessment with rating

Root Cause Analysis

Problem Being Solved: Transient GitHub CDN outages (503/500 errors) causing CI failures unrelated to code changes, as seen in PR #293. This is a legitimate infrastructure reliability issue affecting the development workflow.

Why This Approach:

  • Retry logic: Standard pattern for handling transient network failures, with exponential backoff (15s delay) and proper error reporting via GitHub annotations
  • --ignore-scripts flag: Logical optimization since license-checker only needs package.json metadata from node_modules, not compiled binaries (onnxruntime CUDA, wasi-sdk, tree-sitter-cli)

Risk Assessment: LOW RISK - These are defensive reliability improvements, not masking underlying code issues.

Configuration Change Analysis

LOW SCRUTINY - No rule relaxation detected:

  • No linting rules disabled or relaxed
  • No test timeouts increased or coverage thresholds lowered
  • No TypeScript strict mode changes
  • These are CI reliability improvements, not quality threshold reductions

Test Integrity Assessment

NO RED FLAGS:

  • No test assertions removed or weakened
  • No test files modified
  • No eslint-disable comments added
  • Changes are purely CI infrastructure improvements
  • PR scope is focused (3 files, single concern)

Code Quality Review

✅ Strengths:

  1. Consistent implementation: Identical retry logic applied across all 3 workflows (ci.yml lint job, test job, codegraph-impact.yml)
  2. Proper error handling: Uses GitHub annotations (::warning::, ::error::) for visibility
  3. Reasonable parameters: 3 attempts with 15s delay is industry standard for transient failures
  4. Logical optimization: --ignore-scripts is appropriate for license-checker use case
  5. Clean shell scripting: Proper exit codes and conditional logic

🔍 Minor Observations:

  1. Line 35 in shield-license-compliance.yml: The --ignore-scripts addition is well-justified since license-checker only reads package metadata
  2. Lines 27-36 in ci.yml (and duplicated in codegraph-impact.yml): The retry loop implementation is solid - uses && for proper exit code handling

⚠️ Inconsistency Noted:

shield-license-compliance.yml:34-43: This workflow includes retry logic (good) but Greptile's review comment correctly notes the inconsistency - the license compliance workflow got retry logic added but wasn't mentioned in the PR description.

🎯 Technical Validation:

  • npm install retry logic: Standard pattern for handling npm registry timeouts
  • --ignore-scripts justification: Correct - license-checker doesn't need postinstall scripts that download/compile binaries
  • GitHub annotations: Proper use of ::warning:: and ::error:: for CI visibility
  • Exit codes: Correctly propagates failure after 3 attempts

Final Assessment

## Root Cause Analysis
**Problem Being Solved**: Transient GitHub CDN outages causing unrelated CI failures
**Why This Approach**: Industry-standard retry logic + logical optimization for license scanning
**Risk Assessment**: LOW RISK - Defensive reliability improvements without masking code issues

## Critical Concerns
None identified. These are well-implemented infrastructure reliability improvements.

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: APPROVE
- Reasoning: Solid implementation addressing legitimate infrastructure issues. The retry logic follows best practices, error handling is proper, and the --ignore-scripts optimization is well-justified. Only minor point preventing 5 stars is that this could have been a single commit vs spreading across multiple workflow files, but the implementation itself is excellent.

Recommendation: APPROVE

This PR effectively addresses a real infrastructure reliability issue with well-implemented solutions. The retry logic is industry-standard, properly handles errors, and the --ignore-scripts optimization is technically sound for the license scanning use case.


@carlos-alm
Copy link
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 85c6af9 into main Mar 3, 2026
20 checks passed
@carlos-alm carlos-alm deleted the ci/resilient-npm-install branch March 3, 2026 09:15
@github-actions github-actions bot locked and limited conversation to collaborators Mar 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant