[No QA] fix(ci/test): set node --max-old-space-size=8192 and jest --maxWorker=2#75553

Merged
roryabraham merged 4 commits into Expensify:main from gelocraft:fix-test-workflow-failure-problem
Nov 20, 2025

Conversation

@gelocraft
Contributor

@gelocraft gelocraft commented Nov 19, 2025

Explanation of Change

  • 8 GB × 2 workers = 16 GB per shard (this should fix the heap allocation limit problem)
  • double the number of workflow runners from 4 to 8 to shard further.
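
As a rough illustration of what doubling the runners does to each shard's share of the suite, here is a minimal sketch. The total of 1,000 test files is a made-up figure, `filesPerShard` is a hypothetical helper, and the even split is a simplification: Jest's own `--shard` assignment may distribute files differently.

```typescript
// Hypothetical helper: approximate number of test files each shard runs,
// assuming an even split (Jest's real shard assignment may differ).
function filesPerShard(totalTestFiles: number, shardCount: number): number {
    if (!Number.isInteger(shardCount) || shardCount < 1) {
        throw new Error('shardCount must be a positive integer');
    }
    return Math.ceil(totalTestFiles / shardCount);
}

const TOTAL_TEST_FILES = 1000; // assumed figure, not taken from the repository

const filesWith4Shards = filesPerShard(TOTAL_TEST_FILES, 4); // 250
const filesWith8Shards = filesPerShard(TOTAL_TEST_FILES, 8); // 125
```

Under these assumptions each shard handles about half as many files, which is why fewer workers per shard need not mean a proportionally slower workflow.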

Fixed Issues

$ #75488
PROPOSAL: #75488 (comment)

Tests

  • Verify that no errors appear in the JS console

Offline tests

QA Steps

// TODO: These must be filled out, or the issue title must include "[No QA]."

  • Verify that no errors appear in the JS console

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
    • MacOS: Desktop
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I verified there are no new alerts related to the canBeMissing param for useOnyx
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
      • If any non-english text was added/modified, I used JaimeGPT to get English > Spanish translation. I then posted it in #expensify-open-source and it was approved by an internal Expensify engineer. Link to Slack message:
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If new assets were added or existing ones were modified, I verified that:
    • The assets are optimized and compressed (for SVG files, run npm run compress-svg)
    • The assets load correctly across all supported platforms.
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop

@gelocraft gelocraft requested a review from a team as a code owner November 19, 2025 14:44
@melvin-bot melvin-bot bot requested review from aimane-chnaif and removed request for a team November 19, 2025 14:44
@melvin-bot

melvin-bot bot commented Nov 19, 2025

@aimane-chnaif Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

@github-actions
Contributor

⚠️ This PR is possibly changing native code and/or updating libraries, it may cause problems with HybridApp. Please check if any patch updates are required in the HybridApp repo and run an AdHoc build to verify that HybridApp will not break. Ask Contributor Plus for help if you are not sure how to handle this. ⚠️

@gelocraft
Contributor Author

@mountiny ready for merge

@gelocraft
Contributor Author

no QA for this PR

@gelocraft gelocraft changed the title from "fix(ci/test): set --max-old-space-size=16384" to "[No QA] fix(ci/test): set --max-old-space-size=16384" Nov 19, 2025
@mountiny
Contributor

All tests are failing

Contributor

@aimane-chnaif aimane-chnaif left a comment

Please add [No QA] prefix to the title.

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from 390d5c5 to a4577fd Compare November 19, 2025 14:54
@gelocraft
Contributor Author

All tests are failing

Rerunning the tests.

If setting NODE_OPTIONS in workflow.env doesn't work, I will export NODE_OPTIONS directly in the Jest tests run step:

      - name: Jest tests
        run: |
          export NODE_OPTIONS="--experimental-vm-modules --max-old-space-size=16384"
          npm test -- --silent --shard=${{ fromJSON(matrix.chunk) }}/${{ strategy.job-total }} --maxWorkers=4 --coverage --coverageDirectory=coverage/shard-${{ matrix.chunk }}

@gelocraft
Contributor Author

Ok looks like it fails. I will just go with this approach
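
The approach above can be sketched as a small helper that assembles the same command line. The interface and function names are hypothetical; only the flags come from the step quoted in this thread, and the inline `NODE_OPTIONS=... npm test` form is used here in place of a separate `export` line.

```typescript
// Hypothetical option names; only the flags themselves appear in the discussion.
interface ShardRunOptions {
    shardIndex: number; // 1-based index of this shard
    shardTotal: number; // total number of shards
    maxWorkers: number; // value for Jest's --maxWorkers
    heapLimitMb: number; // value for Node's --max-old-space-size (in MB)
}

function buildJestShardCommand(options: ShardRunOptions): string {
    const {shardIndex, shardTotal, maxWorkers, heapLimitMb} = options;
    if (shardIndex < 1 || shardIndex > shardTotal) {
        throw new Error(`shardIndex must be between 1 and ${shardTotal}`);
    }
    const nodeOptions = `--experimental-vm-modules --max-old-space-size=${heapLimitMb}`;
    const jestArgs = [
        '--silent',
        `--shard=${shardIndex}/${shardTotal}`,
        `--maxWorkers=${maxWorkers}`,
        '--coverage',
        `--coverageDirectory=coverage/shard-${shardIndex}`,
    ].join(' ');
    return `NODE_OPTIONS="${nodeOptions}" npm test -- ${jestArgs}`;
}

// Values matching the first attempt in this thread: 4 shards, 4 workers, 16384 MB.
const exampleCommand = buildJestShardCommand({shardIndex: 1, shardTotal: 4, maxWorkers: 4, heapLimitMb: 16384});
```

Keeping the heap limit and worker count as parameters makes it easier to try the later combinations (8192 MB with 2 workers, for instance) without editing the string by hand.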

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from a4577fd to a3d7a16 Compare November 19, 2025 15:08
@gelocraft
Contributor Author

Weird, the test still fails. I will try setting the memory limit to 8GB and see if it works.

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from a3d7a16 to 83dff1e Compare November 19, 2025 15:21
@gelocraft
Contributor Author

It’s still failing. I think the problem comes from using sharding together with multiple workers because running all of these in parallel consumes a lot of memory.

I will try to decrease the number of workers to 2 and see if it works

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from 83dff1e to 7474a42 Compare November 19, 2025 15:52
@gelocraft
Contributor Author

If using 2 workers per shard still fails, I’ll reduce it to 1 worker per shard.

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from 7474a42 to 5139c64 Compare November 19, 2025 16:02
@codecov

codecov bot commented Nov 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
see 14 files with indirect coverage changes

@gelocraft
Contributor Author

@mountiny all tests are passing. I'll just amend my commit.

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from 5139c64 to 2b33489 Compare November 19, 2025 16:17
@gelocraft gelocraft changed the title from "[No QA] fix(ci/test): set --max-old-space-size=16384" to "[No QA] fix(ci/test): set node --max-old-space-size=8192 and jest --maxWorker=2" Nov 19, 2025
@gelocraft
Contributor Author

gelocraft commented Nov 19, 2025

8 GB × 2 workers = 16 GB per shard (fully utilizing the maximum RAM of the ubuntu-latest runner)

@gelocraft
Contributor Author

8 GB × 2 workers = 16 GB per shard (fully utilizing the maximum RAM of the ubuntu-latest runner)

If the heap allocation limit problem arises again in the future, we can increase the memory limit to 16GB and decrease the workers per shard to 1. (I'm willing to open a follow-up PR.)

16 GB × 1 worker = 16 GB per shard

@gelocraft
Contributor Author

The reason I'm keeping 2 workers per shard instead of 1 is that it makes the workflow finish faster.

@gelocraft
Contributor Author

@aimane-chnaif @mountiny

All tests are passing. Just waiting for the ESLint check to complete, then it’ll be ready to merge

@aimane-chnaif
Contributor

Reviewer Checklist

  • I have verified the author checklist is complete (all boxes are checked off).
  • I verified the correct issue is linked in the ### Fixed Issues section above
  • I verified testing steps are clear and they cover the changes made in this PR
    • I verified the steps for local testing are in the Tests section
    • I verified the steps for Staging and/or Production testing are in the QA steps section
    • I verified the steps cover any possible failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
  • I checked that screenshots or videos are included for tests on all platforms
  • I included screenshots or videos for tests on all platforms
  • I verified that the composer does not automatically focus or open the keyboard on mobile unless explicitly intended. This includes checking that returning the app from the background does not unexpectedly open the keyboard.
  • I verified tests pass on all platforms & I tested again on:
    • Android: HybridApp
    • Android: mWeb Chrome
    • iOS: HybridApp
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
    • MacOS: Desktop
  • If there are any errors in the console that are unrelated to this PR, I either fixed them (preferred) or linked to where I reported them in Slack
  • I verified there are no new alerts related to the canBeMissing param for useOnyx
  • I verified proper code patterns were followed (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick).
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I verified that this PR follows the guidelines as stated in the Review Guidelines
  • I verified other components that can be impacted by these changes have been tested, and I retested again (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar have been tested & I retested again)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG)
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • For any bug fix or new feature in this PR, I verified that sufficient unit tests are included to prevent regressions in this flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR reviewer checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: HybridApp
Android: mWeb Chrome
iOS: HybridApp
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from 6760805 to 2335bfb Compare November 19, 2025 19:28
@gelocraft
Contributor Author

gelocraft commented Nov 19, 2025

@gelocraft please merge main. It's fixed now.

Done rebasing my branch onto upstream/main.

Waiting for all the checks to finish

Contributor

@roryabraham roryabraham left a comment

I think these changes are problematic for a few reasons:

  1. Removing the node options from package.json means that they won't be applied if someone runs npm test locally
  2. Reducing the number of parallel workers will slow down tests. Why not shard further and add another runner instead?
  3. Overall it seems like we're not making strong efforts to address the root cause of very high memory usage tests.

@gelocraft
Contributor Author

@roryabraham

so the new changes would be:

  • add the NODE_OPTIONS back to package.json and set --max-old-space-size to 8GB, because 4GB is no longer enough.
  • use more runners (double them from 4 to 8) and shard further. We can try an 8GB limit + 2 workers per shard to utilize all 16GB of RAM on ubuntu-latest for public repositories.

@gelocraft
Contributor Author

gelocraft commented Nov 19, 2025

  1. Reducing the number of parallel workers will slow down tests. Why not shard further and add another runner instead?

I reduced the number of Jest workers because we need to increase the memory allocation to 8GB, and we have to make sure we don’t exceed the RAM limits of the ubuntu-latest runner.

Explanation:

  • With 4 workers, an 8GB heap per worker would require 32GB of memory, which is far above what the runner provides.

  • With 2 workers, the total would be 16GB, which stays within the runner’s maximum RAM.
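
The arithmetic in this explanation can be written as a small check. This is a sketch under stated assumptions: the function names are hypothetical, the 16 GB figure is the one quoted in this thread for ubuntu-latest, and the calculation ignores the parent process and non-heap memory. Note that `--max-old-space-size` is a cap rather than a reservation, so this is a worst-case ceiling, not expected usage.

```typescript
// Worst-case heap ceiling if every worker grows to its --max-old-space-size limit.
function worstCaseHeapGb(workers: number, heapLimitGb: number): number {
    return workers * heapLimitGb;
}

// True when the worst-case ceiling stays within the runner's RAM.
function fitsRunner(workers: number, heapLimitGb: number, runnerRamGb: number): boolean {
    return worstCaseHeapGb(workers, heapLimitGb) <= runnerRamGb;
}

const RUNNER_RAM_GB = 16; // figure quoted in this thread, treated here as an assumption

const fourWorkersFit = fitsRunner(4, 8, RUNNER_RAM_GB); // false: 32 GB > 16 GB
const twoWorkersFit = fitsRunner(2, 8, RUNNER_RAM_GB); // true: 16 GB <= 16 GB
const oneWorkerFits = fitsRunner(1, 16, RUNNER_RAM_GB); // true: the 16 GB × 1 worker fallback
```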

@gelocraft
Contributor Author

  • add the NODE_OPTIONS back to package.json and set --max-old-space-size to 8GB, because 4GB is no longer enough.
  • use more runners (double them from 4 to 8) and shard further. We can try an 8GB limit + 2 workers per shard to utilize all 16GB of RAM on ubuntu-latest for public repositories.

I will try to implement this.

@roryabraham
Contributor

Does --max-old-space-size manage memory per worker or for the parent node process?

@gelocraft
Contributor Author

Does --max-old-space-size manage memory per worker or for the parent node process?

Both: the limit applies to the parent Node process and to the Jest processes it spawns.

@roryabraham
Contributor

This seems like a reasonable plan then:

  • set --max-old-space-size to 8GB
  • double the number of shards/parallel runners from 4 to 8
  • halve the --max-workers to 2

But still, I'd love to see some investigation into why tests are being so memory consumptive. That plan only really makes sense if tests across the board are similarly memory-hungry. If we have one or two "bad apple" tests that are consuming lots of memory, then we're just moving the goalpost from 4GB to 8GB rather than fixing the root problem.
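
One possible starting point for that investigation, sketched under assumptions: Jest's `--logHeapUsage` flag prints heap usage after each test file. Supposing those numbers were collected into a list, the snippet below flags files whose heap use is far above the median, which would point to "bad apples" rather than uniformly heavy tests. The file names, the values, and the 3× threshold are all invented for illustration.

```typescript
interface HeapSample {
    file: string;
    heapMb: number;
}

// Median via the one or two middle elements of the sorted list.
function median(values: number[]): number {
    if (values.length === 0) {
        throw new Error('median of an empty list is undefined');
    }
    const sorted = values.slice().sort((a, b) => a - b);
    const middle = sorted.slice(Math.floor((sorted.length - 1) / 2), Math.floor(sorted.length / 2) + 1);
    return middle.reduce((sum, value) => sum + value, 0) / middle.length;
}

// Flags files whose heap usage exceeds `factor` times the median.
function findHeapOutliers(samples: HeapSample[], factor = 3): HeapSample[] {
    const threshold = median(samples.map((sample) => sample.heapMb)) * factor;
    return samples.filter((sample) => sample.heapMb > threshold);
}

// Invented sample data for illustration only.
const heapSamples: HeapSample[] = [
    {file: 'tests/unit/A.test.ts', heapMb: 300},
    {file: 'tests/unit/B.test.ts', heapMb: 350},
    {file: 'tests/unit/C.test.ts', heapMb: 320},
    {file: 'tests/ui/D.test.tsx', heapMb: 2400},
];

const heapOutliers = findHeapOutliers(heapSamples);
```

If the outlier list comes back empty on real data, that would support the view that memory use is broadly high across the suite; a short list would suggest fixing those files first.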

@gelocraft gelocraft force-pushed the fix-test-workflow-failure-problem branch from 8287b5e to 3379941 Compare November 19, 2025 20:17
@gelocraft
Contributor Author

But still, I'd love to see some investigation into why tests are being so memory consumptive. That plan only really makes sense if tests across the board are similarly memory-hungry. If we have one or two "bad apple" tests that are consuming lots of memory, then we're just moving the goalpost from 4GB to 8GB rather than fixing the root problem.

Maybe it's because the codebase is so large. The ESLint check in this codebase also uses 8GB of memory: 4GB is not enough, and setting --max-old-space-size to 4GB causes the heap allocation limit problem.

@gelocraft
Contributor Author

How about we shard further to 16 runners to make the test workflow even faster if the test run is not fast enough?

@roryabraham
Contributor

I don't think "throw more resources at it" is the most elegant solution proposal, when we don't really understand the root cause of the problem.

@gelocraft
Contributor Author

@roryabraham thanks for taking the time to review this PR, I really appreciate it.

@gelocraft
Contributor Author

All checks are passing, and this PR will resolve the recent Jest workflow test failures.

        const TEST_AUTH_TOKEN_2 = 'zxcvbnm';

      - jest.setTimeout(60000);
      + jest.setTimeout(120000);
Contributor

What's this change for?

Contributor Author

To fix the exceeded-timeout error in https://github.com/Expensify/App/actions/runs/19515137212/job/55864876727

I increased the timeout from 1 min to 2 min:

1 min = 60000 ms
2 min = 120000 ms
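
The conversion above, written out as a tiny sketch. `minutesToMs` is a hypothetical helper, not something in the repository; it only makes the unit explicit.

```typescript
const MS_PER_MINUTE = 60 * 1000;

// Hypothetical helper: converts minutes to the milliseconds jest.setTimeout expects.
function minutesToMs(minutes: number): number {
    return minutes * MS_PER_MINUTE;
}

const oldTimeoutMs = minutesToMs(1); // 60000
const newTimeoutMs = minutesToMs(2); // 120000

// In a Jest test file the new value would be applied with:
// jest.setTimeout(newTimeoutMs);
```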

@roryabraham roryabraham merged commit cae1e1a into Expensify:main Nov 20, 2025
40 checks passed
@melvin-bot

melvin-bot bot commented Nov 20, 2025

Congrats, that's your 5th PR merged! 🎉 Do you know about the ContributorPlus role? It's an opportunity to earn more in the Expensify Open Source community. Keep up the great work - thanks!

@OSBotify
Contributor

✋ This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release.

@OSBotify
Contributor

🚀 Deployed to staging by https://github.com/roryabraham in version: 9.2.62-0 🚀

platform result
🖥 desktop 🖥 success ✅
🕸 web 🕸 success ✅
🤖 android 🤖 success ✅
🍎 iOS 🍎 success ✅

@OSBotify
Contributor

🚀 Deployed to production by https://github.com/marcaaron in version: 9.2.62-5 🚀

platform result
🖥 desktop 🖥 success ✅
🕸 web 🕸 success ✅
🤖 android 🤖 success ✅
🍎 iOS 🍎 success ✅

5 participants