Conversation
…dules in CI Agent-Logs-Url: https://github.com/dotnet/aspnetcore/sessions/bb420943-c6bf-481c-9c73-41d2f771470d Co-authored-by: javiercn <6995051+javiercn@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce CI flakiness and speed up installs by tuning npm network retry/timeout behavior and adding dependency caching in CI workflows/pipelines.
Changes:
- Added global npm fetch retry/timeout settings via
.npmrc. - Added Azure Pipelines
Cache@2steps to cachenode_moduleskeyed by OS +package-lock.json. - Enabled npm cache support in GitHub Actions workflows via
actions/setup-nodecaching, and removed a redundant CLI retry flag.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
.npmrc |
Increases npm fetch retry/timeout settings to improve CI install reliability. |
.github/workflows/validate-npm-package-lock-json.yml |
Enables actions/setup-node npm caching and removes --fetch-retries from npm ci. |
.github/workflows/update-jquery-validate.yml |
Enables actions/setup-node npm caching for the scheduled dependency update workflow. |
.azure/pipelines/jobs/default-build.yml |
Adds Cache@2 to cache root node_modules after Node installation in both template blocks. |
.azure/pipelines/components-e2e-tests.yml |
Adds Cache@2 for node_modules and skips npm ci when the cache is hit. |
| key: '"node_modules" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | ||
| restoreKeys: | | ||
| "node_modules" | "$(Agent.OS)" |
There was a problem hiding this comment.
The cache key doesn’t include the Node.js major version. If the UseNode@1 version is updated (or varies between jobs), restoring node_modules from an older Node version can cause subtle failures for packages with native components/optional binaries. Consider including the Node version (e.g., a node24 segment or a variable) in the key/restoreKeys.
| key: '"node_modules" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | |
| restoreKeys: | | |
| "node_modules" | "$(Agent.OS)" | |
| key: '"node_modules" | "node24" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | |
| restoreKeys: | | |
| "node_modules" | "node24" | "$(Agent.OS)" |
| - task: Cache@2 | ||
| displayName: Cache node_modules | ||
| inputs: | ||
| key: '"node_modules" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | ||
| restoreKeys: | | ||
| "node_modules" | "$(Agent.OS)" |
There was a problem hiding this comment.
The node_modules cache key is only scoped by OS + lockfile. If the Node version used by the pipeline changes, cached node_modules may become incompatible for packages with native/optional binaries. Consider adding the Node major version (e.g., node24) to the key/restoreKeys to avoid cross-version restores.
| - task: Cache@2 | |
| displayName: Cache node_modules | |
| inputs: | |
| key: '"node_modules" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | |
| restoreKeys: | | |
| "node_modules" | "$(Agent.OS)" | |
| - script: | | |
| NODE_MAJOR_VERSION=$(node -p "process.versions.node.split('.')[0]") | |
| echo "##vso[task.setvariable variable=NodeMajorVersion]node${NODE_MAJOR_VERSION}" | |
| displayName: Detect Node major version for cache key | |
| - task: Cache@2 | |
| displayName: Cache node_modules | |
| inputs: | |
| key: '"node_modules" | "$(Agent.OS)" | "$(NodeMajorVersion)" | $(Build.SourcesDirectory)/package-lock.json' | |
| restoreKeys: | | |
| "node_modules" | "$(Agent.OS)" | "$(NodeMajorVersion)" |
| uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0 | ||
| with: | ||
| node-version: 20.x | ||
| cache: 'npm' |
There was a problem hiding this comment.
actions/setup-node caching the npm cache can allow npm ci to succeed using previously cached tarballs, which may reduce the effectiveness of this workflow as a validation that packages are actually retrievable from the feed. Consider removing cache: 'npm' here, or force a cold cache for this job so missing/unmirrored packages are still detected.
| cache: 'npm' |
| uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0 | ||
| with: | ||
| node-version: '20.x' | ||
| cache: 'npm' |
There was a problem hiding this comment.
actions/setup-node defaults to using the repo-root lockfile to key cache: 'npm'. Since this workflow runs npm install under src/Mvc/build, the cache key/invalidation likely won’t match the dependencies being installed. Consider setting cache-dependency-path: src/Mvc/build/package-lock.json (or dropping caching if you intentionally want a root-level key).
| cache: 'npm' | |
| cache: 'npm' | |
| cache-dependency-path: src/Mvc/build/package-lock.json |
| key: '"node_modules" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | ||
| restoreKeys: | | ||
| "node_modules" | "$(Agent.OS)" |
There was a problem hiding this comment.
The cache key doesn’t include the Node.js major version. If the UseNode@1 version is updated (or varies between jobs), restoring node_modules from an older Node version can cause subtle failures for packages with native components/optional binaries. Consider including the Node version (e.g., a node24 segment or a variable) in the key/restoreKeys.
| key: '"node_modules" | "$(Agent.OS)" | $(Build.SourcesDirectory)/package-lock.json' | |
| restoreKeys: | | |
| "node_modules" | "$(Agent.OS)" | |
| key: '"node_modules" | "$(Agent.OS)" | "node24" | $(Build.SourcesDirectory)/package-lock.json' | |
| restoreKeys: | | |
| "node_modules" | "$(Agent.OS)" | "node24" |
|
@copilot fix the issues |
… cache-dependency-path Agent-Logs-Url: https://github.com/dotnet/aspnetcore/sessions/5888b402-1b87-4de3-867f-e5f8933eaf2a Co-authored-by: javiercn <6995051+javiercn@users.noreply.github.com>
Fixed all review issues in d384d17:
Fixed all the issues from the review in commit d384d17:
|
|
@copilot we are getting ##[error]eng\Npm.Workspace.nodeproj(140,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Restore) The command "npm ci" exited with code 57005. from one of the builds. Investigate if there is a way we can make the npm output more visible |
Agent-Logs-Url: https://github.com/dotnet/aspnetcore/sessions/af088dc7-3aa7-4ada-9090-cd3b336c11da Co-authored-by: javiercn <6995051+javiercn@users.noreply.github.com>
Added |
| displayName: Install Node 24.x | ||
| inputs: | ||
| version: 24.x | ||
| - task: Cache@2 |
There was a problem hiding this comment.
The previous step picks 24.x, which means whatever latest of 24.x. Are we risking getting a new 24.x released but using old cache? Do we even care (e.g, could we miss a potential security fix from node)?
Is it better to specify the full complete version in both the install step and the cache step (as cache key)? So:
- We ensure getting a consistent behavior (i.e, a new node version cannot suddenly break us)
- If node released any security fixes, we will update the version both in the cache key and the install step and will ensure we don't use outdated cache.
There was a problem hiding this comment.
I don't think we have to worry about that. It would be extremely rare and severe if picking up a new node/npm broke the package-lock.json format.
npm ci essentially reads the package-lock.json format and replicates the structure on disk, so I don't think this is something we need to be concerned about.
wtgodbe
left a comment
There was a problem hiding this comment.
LGTM other than Youssef's question. FWIW, I'm looking into the flaky "Could not copy" error in CI now
The npm ci command intermittently fails with exit code 57005 (0xDEAD) in CI, producing zero diagnostic output. The .npmrc retry settings and ConsoleToMsBuild added in dotnet#66430 did not help because npm crashes before writing any error output. Add pre-flight diagnostics (node/npm version, npm config, disk space) so the environment is captured even on silent crashes. Switch to --loglevel verbose so npm logs each network request before the crash point. Capture the npm debug log file after failure instead of immediately aborting, since npm writes detailed logs to .npm/_logs/ even on crash. Relates to dotnet#62807 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The npm ci command intermittently fails with exit code 57005 (0xDEAD) in CI, producing zero diagnostic output. The .npmrc retry settings and ConsoleToMsBuild added in dotnet#66430 did not help because npm crashes before writing any error output. Add pre-flight diagnostics (node/npm version, npm config, disk space) so the environment is captured even on silent crashes. Switch to --loglevel verbose so npm logs each network request before the crash point. Capture the npm debug log file after failure instead of immediately aborting, since npm writes detailed logs to .npm/_logs/ even on crash. Relates to dotnet#62807 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The npm ci command intermittently fails with exit code 57005 (0xDEAD) in CI, producing zero diagnostic output. The .npmrc retry settings and ConsoleToMsBuild added in dotnet#66430 did not help because npm crashes before writing any error output. Add pre-flight diagnostics (node/npm version, npm config, disk space) so the environment is captured even on silent crashes. Switch to --loglevel verbose so npm logs each network request before the crash point. Capture the npm debug log file after failure instead of immediately aborting, since npm writes detailed logs to .npm/_logs/ even on crash. Relates to dotnet#62807 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The npm ci command intermittently fails with exit code 57005 (0xDEAD) in CI, producing zero diagnostic output. The .npmrc retry settings and ConsoleToMsBuild added in dotnet#66430 did not help because npm crashes before writing any error output. Add pre-flight diagnostics (node/npm version, npm config, disk space) so the environment is captured even on silent crashes. Switch to --loglevel verbose so npm logs each network request before the crash point. Capture the npm debug log file after failure instead of immediately aborting, since npm writes detailed logs to .npm/_logs/ even on crash. Relates to dotnet#62807 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The npm ci command intermittently fails with exit code 57005 (0xDEAD) in CI, producing zero diagnostic output. The .npmrc retry settings and ConsoleToMsBuild added in dotnet#66430 did not help because npm crashes before writing any error output. Add pre-flight diagnostics (node/npm version, npm config, disk space) so the environment is captured even on silent crashes. Switch to --loglevel verbose so npm logs each network request before the crash point. Capture the npm debug log file after failure instead of immediately aborting, since npm writes detailed logs to .npm/_logs/ even on crash. Relates to dotnet#62807 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The npm ci command intermittently fails with exit code 57005 (0xDEAD) in CI, producing zero diagnostic output. The .npmrc retry settings and ConsoleToMsBuild added in #66430 did not help because npm crashes before writing any error output. Add pre-flight diagnostics (node/npm version, npm config, disk space) so the environment is captured even on silent crashes. Switch to --loglevel verbose so npm logs each network request before the crash point. Capture the npm debug log file after failure instead of immediately aborting, since npm writes detailed logs to .npm/_logs/ even on crash. Relates to #62807 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
npm's default fetch retry/timeout settings are too aggressive for CI environments, causing flaky package installs.
node_modulesis also restored from scratch on every CI run with no caching. Additionally, npm output was not visible in CI logs, making failures difficult to diagnose.Description
.npmrc— Added npm fetch retry/timeout knobs applied globally to allnpm ci/npm installinvocations:fetch-retries=5(↑ from 2)fetch-retry-factor=2(↓ from 10 — less aggressive backoff)fetch-retry-maxtimeout=120000(↑ from 60000)fetch-retry-mintimeout=10000(unchanged from default, explicit)fetch-timeout=600000(↑ from 300000)default-build.yml— AddedCache@2task afterUseNode@1in both public and official build template blocks; cachesnode_moduleskeyed by"node_modules" | "node24" | OS | hash(package-lock.json). The Node major version is included in the key to prevent cross-version cache restores. MSBuild's incremental build check (node_modules\.package-lock.jsonmarker) automatically skipsnpm cion cache hit.components-e2e-tests.yml— AddedCache@2before the explicitnpm cistep;npm ciis conditioned onNODE_MODULES_CACHE_HIT != 'true'. Cache key includes"node24"to scope by Node major version.validate-npm-package-lock-json.yml— Removed--fetch-retries 5CLI flag (covered by.npmrc).cache: 'npm'is intentionally omitted to preserve the workflow's purpose of validating that packages are actually retrievable from the feed.update-jquery-validate.yml— Addedcache: 'npm'withcache-dependency-path: src/Mvc/build/package-lock.jsontoactions/setup-node, scoping the cache key to the lockfile for the packages actually being installed.Npm.Workspace.nodeproj— AddedConsoleToMsBuild="true"to all npmExectasks (npm ci,npm run build,npm run test) so that npm stdout/stderr is routed through MSBuild's logging pipeline and visible in CI logs.Npm.Workspace.FunctionalTests.nodeproj— AddedConsoleToMsBuild="true"to thenpm run integration-testExectask for the same reason.You've read the Contributor Guide and Code of Conduct.
You've included unit or integration tests for your change, where applicable.
You've included inline docs for your change, where applicable.
There's an open issue for the PR that you are making. If you'd like to propose a new feature or change, please open an issue to discuss the change or find an existing issue.